# Mathematical Reasoning Reinforcement Learning
## Nano Aha Moment 3b
A 3-billion-parameter language model trained with reinforcement learning to solve mathematical reasoning tasks, especially Countdown games.
Tags: Large Language Model, Transformers

Organization: McGill-NLP
## OREAL-32B-SFT
License: Apache-2.0
OREAL-32B-SFT is a supervised fine-tuned model based on Qwen2.5-32B, designed for mathematical reasoning tasks; it serves as the initial policy model for the OREAL reinforcement learning framework.
Tags: Large Language Model, Transformers

Organization: internlm